Dataset statistics
| Number of variables | 11 |
|---|---|
| Number of observations | 79215 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 6.6 MiB |
| Average record size in memory | 88.0 B |
Variable types
| Numeric | 9 |
|---|---|
| Categorical | 2 |
Alerts
| Alert | Type |
|---|---|
| X_01 is highly correlated with X_06 | High correlation |
| X_06 is highly correlated with X_01 and 1 other field | High correlation |
| X_03 is highly correlated with X_06 | High correlation |
| X_10 is highly correlated with X_11 | High correlation |
| X_11 is highly correlated with X_10 | High correlation |
| X_01 is highly correlated with X_05 and 1 other field | High correlation |
| X_05 is highly correlated with X_01 | High correlation |
| X_06 is highly correlated with X_01 | High correlation |
| df_index is highly correlated with X_03 | High correlation |
| X_03 is highly correlated with df_index and 1 other field | High correlation |
| X_08 is highly correlated with X_09 | High correlation |
| X_09 is highly correlated with X_08 | High correlation |
| X_10 is highly skewed (γ1 = 35.00400665) | Skewed |
| df_index is uniformly distributed | Uniform |
| X_10 has 79150 (99.9%) zeros | Zeros |
Reproduction
| Analysis started | 2022-08-08 03:58:17.725028 |
|---|---|
| Analysis finished | 2022-08-08 03:58:35.283472 |
| Duration | 17.56 seconds |
| Software version | pandas-profiling v3.2.0 |
df_index
Real number (ℝ≥0)
| Distinct | 39608 |
|---|---|
| Distinct (%) | 50.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 19803.25 |
| Minimum | 0 |
|---|---|
| Maximum | 39607 |
| Zeros | 2 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 619.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1980 |
| Q1 | 9901.5 |
| median | 19803 |
| Q3 | 29705 |
| 95-th percentile | 37626.3 |
| Maximum | 39607 |
| Range | 39607 |
| Interquartile range (IQR) | 19803.5 |
Descriptive statistics
| Standard deviation | 11433.77256 |
|---|---|
| Coefficient of variation (CV) | 0.5773684907 |
| Kurtosis | -1.199999999 |
| Mean | 19803.25 |
| Median Absolute Deviation (MAD) | 9902 |
| Skewness | 1.6561712 × 10⁻⁹ |
| Sum | 1568714449 |
| Variance | 130731155.1 |
| Monotonicity | Not monotonic |
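These statistics can be cross-checked. The df_index value tables that follow (0 to 9 and 39598 to 39606 each occur twice, 39607 once, 39,608 distinct values in 79,215 rows) are consistent with every integer from 0 to 39606 appearing twice and 39607 once. The minimal sketch below rebuilds that column in plain Python and recomputes the quartiles, IQR, MAD and coefficient of variation. It assumes the report uses the sample standard deviation, linearly interpolated quantiles and the median (not mean) absolute deviation from the median; under those assumptions the results agree with the reported Q1, median, Q3, IQR, MAD and CV.

```python
import statistics

# Assumption: df_index rebuilt from the report's value tables, with every
# integer 0..39606 appearing twice and 39607 appearing once (79,215 rows).
values = [v for v in range(39607) for _ in range(2)] + [39607]

n = len(values)
mean = statistics.mean(values)
std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
# "inclusive" quartiles interpolate linearly between order statistics.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
# Median absolute deviation: the median of |x - median(x)|.
mad = statistics.median(abs(v - median) for v in values)
cv = std / mean  # coefficient of variation

print(f"n={n} mean={mean:.2f} Q1={q1} median={median} Q3={q3}")
print(f"IQR={iqr} MAD={mad} CV={cv:.10f}")
```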
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2 | < 0.1% |
| 26407 | 2 | < 0.1% |
| 26400 | 2 | < 0.1% |
| 26401 | 2 | < 0.1% |
| 26402 | 2 | < 0.1% |
| 26403 | 2 | < 0.1% |
| 26404 | 2 | < 0.1% |
| 26405 | 2 | < 0.1% |
| 26406 | 2 | < 0.1% |
| 26408 | 2 | < 0.1% |
| Other values (39598) | 79195 | > 99.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2 | < 0.1% |
| 1 | 2 | < 0.1% |
| 2 | 2 | < 0.1% |
| 3 | 2 | < 0.1% |
| 4 | 2 | < 0.1% |
| 5 | 2 | < 0.1% |
| 6 | 2 | < 0.1% |
| 7 | 2 | < 0.1% |
| 8 | 2 | < 0.1% |
| 9 | 2 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 39607 | 1 | < 0.1% |
| 39606 | 2 | < 0.1% |
| 39605 | 2 | < 0.1% |
| 39604 | 2 | < 0.1% |
| 39603 | 2 | < 0.1% |
| 39602 | 2 | < 0.1% |
| 39601 | 2 | < 0.1% |
| 39600 | 2 | < 0.1% |
| 39599 | 2 | < 0.1% |
| 39598 | 2 | < 0.1% |
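The value tables list the ten most frequent values and fold the rest into an "Other values (k)" row, with very small or very large shares shown as "< 0.1%" or "> 99.9%". The sketch below is a hypothetical reconstruction of that layout; the function names `fmt_percent` and `frequency_table` are illustrative, and the rounding rule is inferred from the reported numbers (for example, 42 of 79,215 rows is shown as 0.1% while 29 of 79,215 is shown as < 0.1%), not taken from the report's source. Ties among equally frequent values may be ordered differently from the report.

```python
from collections import Counter


def fmt_percent(fraction):
    """Format a fraction of rows as a percentage with one decimal.

    Assumption: the '< 0.1%' / '> 99.9%' edge cases are inferred from the
    report (42 of 79,215 rows shows as 0.1%, 29 of 79,215 as < 0.1%).
    """
    if fraction > 0 and round(fraction, 3) == 0:
        return "< 0.1%"
    if fraction < 1 and round(fraction, 3) == 1:
        return "> 99.9%"
    return f"{fraction * 100:.1f}%"


def frequency_table(values, top=10):
    """Rows of (value, count, frequency) for the `top` most common values,
    plus an 'Other values (k)' row summarising the remaining k distinct values."""
    counts = Counter(values)
    n = len(values)
    rows = [(value, count, fmt_percent(count / n)) for value, count in counts.most_common(top)]
    remaining = len(counts) - len(rows)
    if remaining > 0:
        other = n - sum(count for _, count, _ in rows)
        rows.append((f"Other values ({remaining})", other, fmt_percent(other / n)))
    return rows


# df_index as implied by the tables above: 0..39606 twice, 39607 once.
df_index = [v for v in range(39607) for _ in range(2)] + [39607]
print(frequency_table(df_index)[-1])
```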
X_01
Real number (ℝ≥0)
| Distinct | 33 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 68.40402126 |
| Minimum | 53.209 |
|---|---|
| Maximum | 86.859 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 619.0 KiB |
Quantile statistics
| Minimum | 53.209 |
|---|---|
| 5-th percentile | 64.425 |
| Q1 | 66.465 |
| median | 68.504 |
| Q3 | 69.524 |
| 95-th percentile | 73.603 |
| Maximum | 86.859 |
| Range | 33.65 |
| Interquartile range (IQR) | 3.059 |
Descriptive statistics
| Standard deviation | 2.659533657 |
|---|---|
| Coefficient of variation (CV) | 0.03887978525 |
| Kurtosis | 0.7509481017 |
| Mean | 68.40402126 |
| Median Absolute Deviation (MAD) | 2.039 |
| Skewness | 0.4545739545 |
| Sum | 5418624.544 |
| Variance | 7.073119272 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=33)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 68.504 | 13039 | 16.5% |
| 66.465 | 12495 | 15.8% |
| 69.524 | 11766 | 14.9% |
| 67.485 | 11220 | 14.2% |
| 70.544 | 5784 | 7.3% |
| 71.563 | 5779 | 7.3% |
| 65.445 | 5493 | 6.9% |
| 64.425 | 4659 | 5.9% |
| 72.583 | 2420 | 3.1% |
| 73.603 | 2238 | 2.8% |
| Other values (23) | 4322 | 5.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 53.209 | 1 | < 0.1% |
| 55.248 | 2 | < 0.1% |
| 56.268 | 1 | < 0.1% |
| 57.287 | 2 | < 0.1% |
| 58.307 | 9 | < 0.1% |
| 59.327 | 19 | < 0.1% |
| 60.347 | 33 | < 0.1% |
| 61.366 | 173 | 0.2% |
| 62.386 | 467 | 0.6% |
| 63.406 | 1543 | 1.9% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 86.859 | 1 | < 0.1% |
| 85.84 | 2 | < 0.1% |
| 84.82 | 3 | < 0.1% |
| 83.8 | 4 | < 0.1% |
| 82.78 | 6 | < 0.1% |
| 81.761 | 5 | < 0.1% |
| 80.741 | 11 | < 0.1% |
| 79.721 | 29 | < 0.1% |
| 78.702 | 66 | 0.1% |
| 77.682 | 99 | 0.1% |
X_02
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 619.0 KiB |
| Value | Count |
|---|---|
| 103.32 | 66143 |
| 103.321 | 13072 |
Length
| Max length | 7 |
|---|---|
| Median length | 6 |
| Mean length | 6.165019251 |
| Min length | 6 |
Characters and Unicode
| Total characters | 488362 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
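To illustrate how character properties summarise a textual column, here is a minimal sketch using Python's standard `unicodedata` module, which exposes each code point's general category. The report itself may rely on a different Unicode database library, and the `CATEGORY_NAMES` mapping covers only the two categories seen in this column (an assumption for brevity); `category_counts` is an illustrative name. Fed with X_02's two values at the counts from its Common Values table, it should reproduce the Total characters and Most occurring categories figures.

```python
import unicodedata
from collections import Counter

# Long names for the two general categories that occur in this column;
# any other category falls back to its two-letter code.
CATEGORY_NAMES = {"Nd": "Decimal Number", "Po": "Other Punctuation"}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# X_02 as summarised in the report: 66,143 rows of "103.32" and 13,072 of "103.321".
x_02 = ["103.32"] * 66143 + ["103.321"] * 13072
counts = category_counts(x_02)
print(sum(counts.values()), counts.most_common())
```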
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 103.32 |
|---|---|
| 2nd row | 103.321 |
| 3rd row | 103.32 |
| 4th row | 103.32 |
| 5th row | 103.32 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 103.32 | 66143 | 83.5% |
| 103.321 | 13072 | 16.5% |
Length
Histogram of lengths of the category
Category Frequency Plot
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 158430 | 32.4% |
| 1 | 92287 | 18.9% |
| 0 | 79215 | 16.2% |
| . | 79215 | 16.2% |
| 2 | 79215 | 16.2% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 409147 | 83.8% |
| Other Punctuation | 79215 | 16.2% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 158430 | |
| 1 | 92287 | |
| 0 | 79215 | |
| 2 | 79215 | |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| . | 79215 | |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 488362 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 158430 | 32.4% |
| 1 | 92287 | 18.9% |
| 0 | 79215 | 16.2% |
| . | 79215 | 16.2% |
| 2 | 79215 | 16.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 488362 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 158430 | 32.4% |
| 1 | 92287 | 18.9% |
| 0 | 79215 | 16.2% |
| . | 79215 | 16.2% |
| 2 | 79215 | 16.2% |
X_05
Real number (ℝ≥0)
| Distinct | 462 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 102.3372487 |
| Minimum | 101.734 |
|---|---|
| Maximum | 103.161 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 619.0 KiB |
Quantile statistics
| Minimum | 101.734 |
|---|---|
| 5-th percentile | 101.889 |
| Q1 | 101.949 |
| median | 102.007 |
| Q3 | 103.144 |
| 95-th percentile | 103.157 |
| Maximum | 103.161 |
| Range | 1.427 |
| Interquartile range (IQR) | 1.195 |
Descriptive statistics
| Standard deviation | 0.5481525175 |
|---|---|
| Coefficient of variation (CV) | 0.005356334323 |
| Kurtosis | -1.337063559 |
| Mean | 102.3372487 |
| Median Absolute Deviation (MAD) | 0.077 |
| Skewness | 0.7933924669 |
| Sum | 8106645.152 |
| Variance | 0.3004711824 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 103.157 | 2498 | 3.2% |
| 103.158 | 2022 | 2.6% |
| 103.156 | 1789 | 2.3% |
| 103.155 | 1634 | 2.1% |
| 103.154 | 1618 | 2.0% |
| 103.153 | 1586 | 2.0% |
| 103.152 | 1373 | 1.7% |
| 103.151 | 1195 | 1.5% |
| 103.15 | 1088 | 1.4% |
| 103.149 | 898 | 1.1% |
| Other values (452) | 63514 | 80.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 101.734 | 1 | < 0.1% |
| 101.754 | 1 | < 0.1% |
| 101.774 | 1 | < 0.1% |
| 101.778 | 1 | < 0.1% |
| 101.782 | 2 | < 0.1% |
| 101.786 | 1 | < 0.1% |
| 101.787 | 1 | < 0.1% |
| 101.788 | 3 | < 0.1% |
| 101.789 | 2 | < 0.1% |
| 101.79 | 4 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 103.161 | 5 | < 0.1% |
| 103.16 | 486 | 0.6% |
| 103.159 | 862 | 1.1% |
| 103.158 | 2022 | 2.6% |
| 103.157 | 2498 | 3.2% |
| 103.156 | 1789 | 2.3% |
| 103.155 | 1634 | 2.1% |
| 103.154 | 1618 | 2.0% |
| 103.153 | 1586 | 2.0% |
| 103.152 | 1373 | 1.7% |
X_06
Real number (ℝ≥0)
| Distinct | 26 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 70.59093362 |
| Minimum | 61.726 |
|---|---|
| Maximum | 87.219 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 619.0 KiB |
Quantile statistics
| Minimum | 61.726 |
|---|---|
| 5-th percentile | 66.825 |
| Q1 | 68.864 |
| median | 69.884 |
| Q3 | 71.923 |
| 95-th percentile | 73.963 |
| Maximum | 87.219 |
| Range | 25.493 |
| Interquartile range (IQR) | 3.059 |
Descriptive statistics
| Standard deviation | 2.255384568 |
|---|---|
| Coefficient of variation (CV) | 0.03195006005 |
| Kurtosis | 0.6872052854 |
| Mean | 70.59093362 |
| Median Absolute Deviation (MAD) | 1.02 |
| Skewness | 0.4405082341 |
| Sum | 5591860.807 |
| Variance | 5.086759552 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=26)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 69.884 | 17043 | 21.5% |
| 71.923 | 12720 | 16.1% |
| 68.864 | 12426 | 15.7% |
| 70.904 | 11011 | 13.9% |
| 67.845 | 6443 | 8.1% |
| 72.943 | 5614 | 7.1% |
| 73.963 | 5128 | 6.5% |
| 66.825 | 3766 | 4.8% |
| 74.983 | 2405 | 3.0% |
| 65.805 | 761 | 1.0% |
| Other values (16) | 1898 | 2.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 61.726 | 5 | < 0.1% |
| 62.746 | 8 | < 0.1% |
| 63.766 | 42 | 0.1% |
| 64.785 | 310 | 0.4% |
| 65.805 | 761 | 1.0% |
| 66.825 | 3766 | 4.8% |
| 67.845 | 6443 | 8.1% |
| 68.864 | 12426 | 15.7% |
| 69.884 | 17043 | 21.5% |
| 70.904 | 11011 | 13.9% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 87.219 | 2 | < 0.1% |
| 86.2 | 2 | < 0.1% |
| 85.18 | 3 | < 0.1% |
| 84.16 | 5 | < 0.1% |
| 83.14 | 6 | < 0.1% |
| 82.121 | 5 | < 0.1% |
| 81.101 | 13 | < 0.1% |
| 80.081 | 34 | < 0.1% |
| 79.062 | 93 | 0.1% |
| 78.042 | 146 | 0.2% |
X_03
Real number (ℝ≥0)
| Distinct | 298 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 68.83213722 |
| Minimum | 55.57 |
|---|---|
| Maximum | 89.17 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 619.0 KiB |
Quantile statistics
| Minimum | 55.57 |
|---|---|
| 5-th percentile | 62.77 |
| Q1 | 65.07 |
| median | 67.27 |
| Q3 | 71.77 |
| 95-th percentile | 80.27 |
| Maximum | 89.17 |
| Range | 33.6 |
| Interquartile range (IQR) | 6.7 |
Descriptive statistics
| Standard deviation | 5.178511143 |
|---|---|
| Coefficient of variation (CV) | 0.07523391474 |
| Kurtosis | 0.2512423422 |
| Mean | 68.83213722 |
| Median Absolute Deviation (MAD) | 2.8 |
| Skewness | 0.9801863773 |
| Sum | 5452537.75 |
| Variance | 26.81697766 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 65.17 | 1019 | 1.3% |
| 65.77 | 1007 | 1.3% |
| 65.87 | 990 | 1.2% |
| 66.07 | 988 | 1.2% |
| 65.07 | 970 | 1.2% |
| 66.37 | 968 | 1.2% |
| 65.57 | 956 | 1.2% |
| 65.97 | 951 | 1.2% |
| 65.67 | 951 | 1.2% |
| 65.37 | 950 | 1.2% |
| Other values (288) | 69465 | 87.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 55.57 | 1 | < 0.1% |
| 56.47 | 1 | < 0.1% |
| 56.77 | 2 | < 0.1% |
| 56.87 | 1 | < 0.1% |
| 56.97 | 1 | < 0.1% |
| 57.07 | 1 | < 0.1% |
| 57.27 | 1 | < 0.1% |
| 57.47 | 1 | < 0.1% |
| 57.67 | 1 | < 0.1% |
| 57.77 | 3 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 89.17 | 1 | < 0.1% |
| 88.67 | 1 | < 0.1% |
| 87.77 | 1 | < 0.1% |
| 87.67 | 1 | < 0.1% |
| 87.57 | 1 | < 0.1% |
| 87.17 | 1 | < 0.1% |
| 86.87 | 3 | < 0.1% |
| 86.77 | 1 | < 0.1% |
| 86.67 | 2 | < 0.1% |
| 86.57 | 1 | < 0.1% |
X_10
Real number (ℝ≥0)
Alerts: High correlation, Skewed, Zeros
| Distinct | 9 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.002504576154 |
| Minimum | 0 |
|---|---|
| Maximum | 3.6 |
| Zeros | 79150 |
| Zeros (%) | 99.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 619.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 3.6 |
| Range | 3.6 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.0875087213 |
|---|---|
| Coefficient of variation (CV) | 34.93953305 |
| Kurtosis | 1226.741262 |
| Mean | 0.002504576154 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 35.00400665 |
| Sum | 198.4 |
| Variance | 0.007657776303 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=9)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 79150 | 99.9% |
| 3 | 20 | < 0.1% |
| 2.9 | 17 | < 0.1% |
| 3.1 | 14 | < 0.1% |
| 3.3 | 6 | < 0.1% |
| 3.2 | 5 | < 0.1% |
| 3.6 | 1 | < 0.1% |
| 3.5 | 1 | < 0.1% |
| 2.8 | 1 | < 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 79150 | 99.9% |
| 2.8 | 1 | < 0.1% |
| 2.9 | 17 | < 0.1% |
| 3 | 20 | < 0.1% |
| 3.1 | 14 | < 0.1% |
| 3.2 | 5 | < 0.1% |
| 3.3 | 6 | < 0.1% |
| 3.5 | 1 | < 0.1% |
| 3.6 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3.6 | 1 | < 0.1% |
| 3.5 | 1 | < 0.1% |
| 3.3 | 6 | < 0.1% |
| 3.2 | 5 | < 0.1% |
| 3.1 | 14 | < 0.1% |
| 3 | 20 | < 0.1% |
| 2.9 | 17 | < 0.1% |
| 2.8 | 1 | < 0.1% |
| 0 | 79150 | 99.9% |
X_11
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 619.0 KiB |
| Value | Count |
|---|---|
| 0.0 | 79159 |
| 0.5 | 25 |
| 0.6 | 24 |
| 0.4 | 6 |
| 0.7 | 1 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 237645 |
|---|---|
| Distinct characters | 6 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 0.0 |
| 3rd row | 0.0 |
| 4th row | 0.0 |
| 5th row | 0.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 79159 | 99.9% |
| 0.5 | 25 | < 0.1% |
| 0.6 | 24 | < 0.1% |
| 0.4 | 6 | < 0.1% |
| 0.7 | 1 | < 0.1% |
Length
Histogram of lengths of the category
Category Frequency Plot
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 158374 | 66.6% |
| . | 79215 | 33.3% |
| 5 | 25 | < 0.1% |
| 6 | 24 | < 0.1% |
| 4 | 6 | < 0.1% |
| 7 | 1 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 158430 | 66.7% |
| Other Punctuation | 79215 | 33.3% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 158374 | |
| 5 | 25 | < 0.1% |
| 6 | 24 | < 0.1% |
| 4 | 6 | < 0.1% |
| 7 | 1 | < 0.1% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| . | 79215 | |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 237645 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 158374 | 66.6% |
| . | 79215 | 33.3% |
| 5 | 25 | < 0.1% |
| 6 | 24 | < 0.1% |
| 4 | 6 | < 0.1% |
| 7 | 1 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 237645 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 158374 | 66.6% |
| . | 79215 | 33.3% |
| 5 | 25 | < 0.1% |
| 6 | 24 | < 0.1% |
| 4 | 6 | < 0.1% |
| 7 | 1 | < 0.1% |
X_07
Real number (ℝ≥0)
| Distinct | 1684 |
|---|---|
| Distinct (%) | 2.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 29.43642643 |
| Minimum | 13.39 |
|---|---|
| Maximum | 163.86 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 619.0 KiB |
Quantile statistics
| Minimum | 13.39 |
|---|---|
| 5-th percentile | 26.17 |
| Q1 | 27.89 |
| median | 28.84 |
| Q3 | 29.87 |
| 95-th percentile | 32.7 |
| Maximum | 163.86 |
| Range | 150.47 |
| Interquartile range (IQR) | 1.98 |
Descriptive statistics
| Standard deviation | 7.608329676 |
|---|---|
| Coefficient of variation (CV) | 0.2584664851 |
| Kurtosis | 282.2671693 |
| Mean | 29.43642643 |
| Median Absolute Deviation (MAD) | 0.98 |
| Skewness | 16.24178307 |
| Sum | 2331806.52 |
| Variance | 57.88668045 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 28.82 | 278 | 0.4% |
| 28.86 | 272 | 0.3% |
| 28.63 | 270 | 0.3% |
| 28.74 | 268 | 0.3% |
| 29.06 | 261 | 0.3% |
| 28.92 | 255 | 0.3% |
| 28.78 | 255 | 0.3% |
| 28.71 | 254 | 0.3% |
| 28.53 | 253 | 0.3% |
| 28.49 | 253 | 0.3% |
| Other values (1674) | 76596 | 96.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 13.39 | 1 | < 0.1% |
| 14.14 | 1 | < 0.1% |
| 15.01 | 1 | < 0.1% |
| 22.48 | 1 | < 0.1% |
| 23.25 | 1 | < 0.1% |
| 23.46 | 1 | < 0.1% |
| 23.92 | 1 | < 0.1% |
| 23.95 | 2 | < 0.1% |
| 23.96 | 1 | < 0.1% |
| 23.97 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 163.86 | 203 | 0.3% |
| 163.85 | 1 | < 0.1% |
| 163.81 | 2 | < 0.1% |
| 163.79 | 1 | < 0.1% |
| 163.78 | 2 | < 0.1% |
| 163.77 | 1 | < 0.1% |
| 163.73 | 2 | < 0.1% |
| 163.69 | 1 | < 0.1% |
| 163.65 | 2 | < 0.1% |
| 163.64 | 1 | < 0.1% |
X_08
Real number (ℝ≥0)
| Distinct | 19711 |
|---|---|
| Distinct (%) | 24.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 163.9109997 |
| Minimum | 28.59 |
|---|---|
| Maximum | 2387.44 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 619.0 KiB |
Quantile statistics
| Minimum | 28.59 |
|---|---|
| 5-th percentile | 74.07 |
| Q1 | 105.87 |
| median | 115.04 |
| Q3 | 132.11 |
| 95-th percentile | 328.026 |
| Maximum | 2387.44 |
| Range | 2358.85 |
| Interquartile range (IQR) | 26.24 |
Descriptive statistics
| Standard deviation | 219.8447029 |
|---|---|
| Coefficient of variation (CV) | 1.341244354 |
| Kurtosis | 51.1574496 |
| Mean | 163.9109997 |
| Median Absolute Deviation (MAD) | 10.79 |
| Skewness | 6.697248083 |
| Sum | 12984209.84 |
| Variance | 48331.6934 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 115.66 | 49 | 0.1% |
| 116.11 | 47 | 0.1% |
| 113.02 | 46 | 0.1% |
| 114.17 | 45 | 0.1% |
| 111.77 | 45 | 0.1% |
| 114.34 | 45 | 0.1% |
| 115.96 | 44 | 0.1% |
| 113.97 | 44 | 0.1% |
| 115.29 | 42 | 0.1% |
| 114.52 | 42 | 0.1% |
| Other values (19701) | 78766 | 99.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 28.59 | 1 | < 0.1% |
| 31.88 | 1 | < 0.1% |
| 38.46 | 1 | < 0.1% |
| 42.4 | 1 | < 0.1% |
| 42.41 | 1 | < 0.1% |
| 42.44 | 1 | < 0.1% |
| 42.53 | 1 | < 0.1% |
| 42.64 | 1 | < 0.1% |
| 42.74 | 1 | < 0.1% |
| 42.85 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2387.44 | 15 | < 0.1% |
| 2387.43 | 4 | < 0.1% |
| 2387.42 | 7 | < 0.1% |
| 2387.41 | 1 | < 0.1% |
| 2387.38 | 3 | < 0.1% |
| 2387.36 | 2 | < 0.1% |
| 2387.33 | 1 | < 0.1% |
| 2387.3 | 2 | < 0.1% |
| 2387.26 | 1 | < 0.1% |
| 2387.24 | 1 | < 0.1% |
X_09
Real number (ℝ≥0)
| Distinct | 13261 |
|---|---|
| Distinct (%) | 16.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 225.2985645 |
| Minimum | 37.58 |
|---|---|
| Maximum | 637.54 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 619.0 KiB |
Quantile statistics
| Minimum | 37.58 |
|---|---|
| 5-th percentile | 109.9 |
| Q1 | 188.48 |
| median | 234.69 |
| Q3 | 263.98 |
| 95-th percentile | 300.44 |
| Maximum | 637.54 |
| Range | 599.96 |
| Interquartile range (IQR) | 75.5 |
Descriptive statistics
| Standard deviation | 66.50051019 |
|---|---|
| Coefficient of variation (CV) | 0.2951661513 |
| Kurtosis | 5.453370941 |
| Mean | 225.2985645 |
| Median Absolute Deviation (MAD) | 36.8 |
| Skewness | 0.49983987 |
| Sum | 17847025.79 |
| Variance | 4422.317855 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 37.58 | 1728 | 2.2% |
| 250.38 | 30 | < 0.1% |
| 255.71 | 28 | < 0.1% |
| 254.63 | 27 | < 0.1% |
| 253.34 | 26 | < 0.1% |
| 263.41 | 26 | < 0.1% |
| 260.39 | 24 | < 0.1% |
| 251.76 | 24 | < 0.1% |
| 259.16 | 24 | < 0.1% |
| 255.47 | 24 | < 0.1% |
| Other values (13251) | 77254 | 97.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 37.58 | 1728 | 2.2% |
| 87.61 | 1 | < 0.1% |
| 87.68 | 2 | < 0.1% |
| 87.76 | 1 | < 0.1% |
| 87.81 | 2 | < 0.1% |
| 87.89 | 1 | < 0.1% |
| 87.91 | 1 | < 0.1% |
| 87.93 | 1 | < 0.1% |
| 87.94 | 1 | < 0.1% |
| 87.96 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 637.54 | 1 | < 0.1% |
| 637.49 | 1 | < 0.1% |
| 637.45 | 1 | < 0.1% |
| 637.28 | 1 | < 0.1% |
| 636.57 | 1 | < 0.1% |
| 635.75 | 1 | < 0.1% |
| 635.6 | 1 | < 0.1% |
| 635.27 | 2 | < 0.1% |
| 635.12 | 1 | < 0.1% |
| 634.91 | 1 | < 0.1% |
Correlations
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
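Following that definition (Pearson's r applied to the ranks), a minimal plain-Python sketch is shown below. Tied values receive the average of the ranks they span, which is the usual convention; treating ties this way is an assumption here, as the text above does not say how ties are handled. The helper names `average_ranks`, `pearson` and `spearman` are illustrative.

```python
def average_ranks(xs):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic but nonlinear
print(spearman(x, y), pearson(x, y))
```

For a monotonic but curved relation such as y = x³, ρ is 1 while r stays below 1, which is the difference the paragraph describes.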
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (provided the two scale factors have the same sign), so for an exact linear relationship the steepness of the line does not affect r; only the sign of the slope does. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
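The definition and the invariance property can be checked numerically. The sketch below computes r exactly as described (covariance over the product of standard deviations) on a small made-up sample; the data values and the name `pearson` are illustrative only.

```python
import math


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # The 1/n factors cancel in the ratio, so population or sample
    # normalisation (used consistently) gives the same r.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2.0, 1.0, 5.0, 6.0, 12.0]

r = pearson(x, y)
# Shifting and rescaling both variables by positive factors leaves r unchanged.
r_affine = pearson([3 * a + 10 for a in x], [0.5 * b - 2 for b in y])
# For an exact linear relation |r| = 1 whatever the slope; only its sign matters.
r_steep = pearson(x, [40 * a + 1 for a in x])
r_negative = pearson(x, [-0.1 * a for a in x])
print(r, r_affine, r_steep, r_negative)
```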
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations; τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs (n(n-1)/2 for n observations).
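The pair-counting definition translates directly into code. The sketch below computes that quantity (often called τ_a) in O(n²) time, counting tied pairs as neither concordant nor discordant. It is a literal rendering of the description, not necessarily what the report computes: libraries such as SciPy default to the tie-adjusted τ_b, so results can differ when the data contain ties. The name `kendall_tau_a` is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1  # both variables move in the same direction
        elif s < 0:
            discordant += 1  # the variables move in opposite directions
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# One swapped pair out of six: 5 concordant, 1 discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```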
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma (2013).
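The bias correction can be written out directly. Below is a minimal pure-Python sketch of the corrected estimator in its standard form: φ² = χ²/n is reduced by (k-1)(r-1)/(n-1) and clipped at 0, and the table dimensions r and k are shrunk in the same spirit before taking the ratio. The function name `cramers_v_corrected` is illustrative, and the report's own implementation may differ in details such as how it treats a column with a single category. In this dataset the measure applies to the two categorical columns, X_02 and X_11.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V (Bergsma 2013) for two categorical sequences."""
    n = len(x)
    rows, cols = set(x), set(y)
    r, k = len(rows), len(cols)
    observed = Counter(zip(x, y))
    row_tot, col_tot = Counter(x), Counter(y)
    # Pearson chi-squared statistic of the r x k contingency table.
    chi2 = 0.0
    for a in rows:
        for b in cols:
            expected = row_tot[a] * col_tot[b] / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    phi2 = chi2 / n
    # Bias corrections for phi^2 and for the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Perfectly associated labels give V = 1; independent labels give V = 0.
print(cramers_v_corrected(["a", "b"] * 50, ["p", "q"] * 50))
print(cramers_v_corrected(["a", "a", "b", "b"] * 25, ["p", "q", "p", "q"] * 25))
```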
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. There is extensive documentation available for it.
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
First rows
| | df_index | X_01 | X_02 | X_05 | X_06 | X_03 | X_10 | X_11 | X_07 | X_08 | X_09 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 70.544 | 103.320 | 101.892 | 74.983 | 67.47 | 0.0 | 0.0 | 29.45 | 62.38 | 245.71 |
| 1 | 1 | 69.524 | 103.321 | 101.944 | 72.943 | 65.17 | 0.0 | 0.0 | 28.73 | 61.23 | 233.61 |
| 2 | 2 | 72.583 | 103.320 | 103.153 | 72.943 | 64.07 | 0.0 | 0.0 | 28.81 | 105.77 | 272.20 |
| 3 | 3 | 71.563 | 103.320 | 101.971 | 77.022 | 67.57 | 0.0 | 0.0 | 28.92 | 115.21 | 255.36 |
| 4 | 4 | 69.524 | 103.320 | 101.981 | 70.904 | 63.57 | 0.0 | 0.0 | 29.68 | 103.38 | 241.46 |
| 5 | 5 | 69.524 | 103.320 | 101.899 | 69.884 | 62.77 | 0.0 | 0.0 | 27.90 | 64.97 | 241.85 |
| 6 | 6 | 71.563 | 103.320 | 101.921 | 73.963 | 66.07 | 0.0 | 0.0 | 29.30 | 69.22 | 237.51 |
| 7 | 7 | 69.524 | 103.320 | 101.968 | 73.963 | 65.47 | 0.0 | 0.0 | 29.77 | 71.41 | 238.27 |
| 8 | 8 | 71.563 | 103.320 | 101.996 | 72.943 | 66.27 | 0.0 | 0.0 | 29.76 | 68.75 | 232.23 |
| 9 | 9 | 71.563 | 103.320 | 101.990 | 77.022 | 68.97 | 0.0 | 0.0 | 28.97 | 66.88 | 228.22 |
Last rows
| | df_index | X_01 | X_02 | X_05 | X_06 | X_03 | X_10 | X_11 | X_07 | X_08 | X_09 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 79205 | 39598 | 68.504 | 103.321 | 102.053 | 68.864 | 64.67 | 0.0 | 0.0 | 31.75 | 115.40 | 191.02 |
| 79206 | 39599 | 67.485 | 103.320 | 102.038 | 70.904 | 67.07 | 0.0 | 0.0 | 30.74 | 119.53 | 197.42 |
| 79207 | 39600 | 69.524 | 103.320 | 103.143 | 69.884 | 63.87 | 0.0 | 0.0 | 29.80 | 114.08 | 198.80 |
| 79208 | 39601 | 68.504 | 103.320 | 103.135 | 68.864 | 62.77 | 0.0 | 0.0 | 31.99 | 117.76 | 292.87 |
| 79209 | 39602 | 67.485 | 103.320 | 102.050 | 69.884 | 64.97 | 0.0 | 0.0 | 29.59 | 114.22 | 291.71 |
| 79210 | 39603 | 68.504 | 103.320 | 103.157 | 68.864 | 63.97 | 0.0 | 0.0 | 29.49 | 116.35 | 284.16 |
| 79211 | 39604 | 68.504 | 103.320 | 103.137 | 68.864 | 61.37 | 0.0 | 0.0 | 32.29 | 116.28 | 272.41 |
| 79212 | 39605 | 69.524 | 103.320 | 103.149 | 69.884 | 63.67 | 0.0 | 0.0 | 30.00 | 113.05 | 295.54 |
| 79213 | 39606 | 67.485 | 103.321 | 103.148 | 67.845 | 61.77 | 0.0 | 0.0 | 32.05 | 115.05 | 267.26 |
| 79214 | 39607 | 71.563 | 103.320 | 103.158 | 71.923 | 63.07 | 0.0 | 0.0 | 31.14 | 102.22 | 215.85 |